AITopics | Marietta

Collaborating Authors

Marietta

Privacy-Preserving Generative Modeling and Clinical Validation of Longitudinal Health Records for Chronic Disease

Ballyk, Benjamin D., Gupta, Ankit, Konda, Sujay, Subramanian, Kavitha, Landon, Chris, Naseer, Ahmed Ammar, Maierhofer, Georg, Swaminathan, Sumanth, Venkateshwaran, Vasudevan

arXiv.org Machine LearningDec-2-2025

Data privacy is a critical challenge in modern medical workflows as the adoption of electronic patient records has grown rapidly. Stringent data protection regulations limit access to clinical records for training and integrating machine learning models that have shown promise in improving diagnostic accuracy and personalized care outcomes. Synthetic data offers a promising alternative; however, current generative models either struggle with time-series data or lack formal privacy guaranties. In this paper, we enhance a state-of-the-art time-series generative model to better handle longitudinal clinical data while incorporating quantifiable privacy safeguards. Using real data from chronic kidney disease and ICU patients, we evaluate our method through statistical tests, a Train-on-Synthetic-Test-on-Real (TSTR) setup, and expert clinical review. Our non-private model (Augmented TimeGAN) outperforms transformer- and flow-based models on statistical metrics in several datasets, while our private model (DP-TimeGAN) maintains a mean authenticity of 0.778 on the CKD dataset, outperforming existing state-of-the-art models on the privacy-utility frontier. Both models achieve performance comparable to real data in clinician evaluations, providing robust input data necessary for developing models for complex chronic conditions without compromising data privacy.

dataset, synthetic longitudinal ehr, timegan, (13 more...)

arXiv.org Machine Learning

2512.00434

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(11 more...)

Genre:

Research Report > Experimental Study (0.66)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Therapeutic Area > Nephrology (0.88)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.64)

Add feedback

Macroscopic Emission Modeling of Urban Traffic Using Probe Vehicle Data: A Machine Learning Approach

Adlouni, Mohammed Ali El, Jin, Ling, Xu, Xiaodan, Spurlock, C. Anna, Lazar, Alina, Sadabadi, Kaveh Farokhi, Amirgholy, Mahyar, Asudegi, Mona

arXiv.org Artificial IntelligenceNov-13-2025

Urban congestions cause inefficient movement of vehicles and exacerbate greenhouse gas emissions and urban air pollution. Macroscopic emission fundamental diagram (eMFD)captures an orderly relationship among emission and aggregated traffic variables at the network level, allowing for real-time monitoring of region-wide emissions and optimal allocation of travel demand to existing networks, reducing urban congestion and associated emissions. However, empirically derived eMFD models are sparse due to historical data limitation. Leveraging a large-scale and granular traffic and emission data derived from probe vehicles, this study is the first to apply machine learning methods to predict the network wide emission rate to traffic relationship in U.S. urban areas at a large scale. The analysis framework and insights developed in this work generate data-driven eMFDs and a deeper understanding of their location dependence on network, infrastructure, land use, and vehicle characteristics, enabling transportation authorities to measure carbon emissions from urban transport of given travel demand and optimize location specific traffic management and planning decisions to mitigate network-wide emissions.

artificial intelligence, emission, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2511.08722

Country:

North America > United States > New York (0.05)
North America > United States > California > Alameda County > Berkeley (0.05)
North America > United States > Texas (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Energy (1.00)
Transportation > Infrastructure & Services (0.69)
Government > Regional Government > North America Government > United States Government (0.48)
Transportation > Ground > Road (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case

Hairston, JaMor, Ranjan, Ritvik, Lakamana, Sahithi, Spadaro, Anthony, Bozkurt, Selen, Perrone, Jeanmarie, Sarker, Abeed

arXiv.org Artificial IntelligenceOct-16-2025

Background Large language models (LLMs) face challenges in inductive thematic analysis, a task requiring deep interpretive and domain-specific expertise. We evaluated the feasibility of using LLMs to replicate expert-driven thematic analysis of social media data. Methods Using two temporally non-intersecting Reddit datasets on xylazine (n=286 and n=686, for model optimization and validation, respectively) with twelve expert-derived themes, we evaluated five LLMs against expert coding. We modeled the task as a series of binary classifications, rather than a single, multi-label classification, employing zero-, single-, and few-shot prompting strategies and measuring performance via accuracy, precision, recall, and F1-score. Results On the validation set, GPT-4o with two-shot prompting performed best (accuracy: 90.9%; F1-score: 0.71). For high-prevalence themes, model-derived thematic distributions closely mirrored expert classifications (e.g., xylazine use: 13.6% vs. 17.8%; MOUD use: 16.5% vs. 17.8%). Conclusions Our findings suggest that few-shot LLM-based approaches can automate thematic analyses, offering a scalable supplement for qualitative research. Keywords: thematic analysis, large language models, natural language processing, qualitative analysis, social media, prompt engineering, public health

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1093/jamiaopen/ooaf102

2507.10803

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > New Jersey > Essex County > Newark (0.04)
North America > United States > Georgia > Cobb County > Marietta (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Interpretable Machine Learning for Cognitive Aging: Handling Missing Data and Uncovering Social Determinant

Mao, Xi, Wang, Zhendong, Li, Jingyu, Mao, Lingchao, Essien, Utibe, Wang, Hairong, Ni, Xuelei Sherry

arXiv.org Artificial IntelligenceOct-14-2025

Early detection of Alzheimer's disease (AD) is crucial because its neurodegenerative effects are irreversible, and neuropathologic and social-behavioral risk factors accumulate years before diagnosis. Identifying higher-risk individuals earlier enables prevention, timely care, and equitable resource allocation. We predict cognitive performance from social determinants of health (SDOH) using the NIH NIA-supported PREPARE Challenge Phase 2 dataset derived from the nationally representative Mex-Cog cohort of the 2003 and 2012 Mexican Health and Aging Study (MHAS). Data: The target is a validated composite cognitive score across seven domains-orientation, memory, attention, language, constructional praxis, and executive function-derived from the 2016 and 2021 MHAS waves. Predictors span demographic, socioeconomic, health, lifestyle, psychosocial, and healthcare access factors. Methodology: Missingness was addressed with a singular value decomposition (SVD)-based imputation pipeline treating continuous and categorical variables separately. This approach leverages latent feature correlations to recover missing values while balancing reliability and scalability. After evaluating multiple methods, XGBoost was chosen for its superior predictive performance. Results and Discussion: The framework outperformed existing methods and the data challenge leaderboard, demonstrating high accuracy, robustness, and interpretability. SHAP-based post hoc analysis identified top contributing SDOH factors and age-specific feature patterns. Notably, flooring material emerged as a strong predictor, reflecting socioeconomic and environmental disparities. Other influential factors, age, SES, lifestyle, social interaction, sleep, stress, and BMI, underscore the multifactorial nature of cognitive aging and the value of interpretable, data-driven SDOH modeling.

artificial intelligence, flooring material, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.10952

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Georgia > Cobb County > Marietta (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Evaluation of Machine and Deep Learning Techniques for Cyclone Trajectory Regression and Status Classification by Time Series Data

Lo, Ethan Zachary, Lo, Dan Chie-Tien

arXiv.org Artificial IntelligenceSep-30-2025

Abstract--Accurate cyclone forecasting is essential for minimizing loss of life, infrastructure damage, and economic disruption. Traditional numerical weather prediction models, though effective, are computationally intensive and prone to error due to the chaotic nature of atmospheric systems. This study proposes a machine learning (ML) approach to forecasting tropical cyclone trajectory and status using time series data from the National Hurricane Center, including recently added best track wind radii. A two-stage ML pipeline is developed: a regression model first predicts cyclone features--maximum wind speed, minimum pressure, trajectory length, and directional change--using a sliding window of historical data. These outputs are then input into classification models to predict the cyclone's categorical status. Gradient boosting regression and three classifiers--random forest (RF), support vector machine (SVM), and multi-layer perceptron (MLP)--are evaluated. After hyperparameter tuning and synthetic minority oversampling (SMOTE), the RF classifier achieves the highest performance with 93% accuracy, outperforming SVM and MLP across precision, recall, and F1 score. The RF model is particularly robust in identifying minority cyclone statuses and minimizing false negatives. Regression results yield low mean absolute errors, with pressure and wind predictions within 2.2 mb and 2.4 kt, respectively. These findings demonstrate that ML models, especially ensemble-based classifiers, offer an effective, scalable alternative to traditional forecasting methods, with potential for real-time cyclone prediction and integration into decision-support systems.

artificial intelligence, cyclone, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.24146

Country:

North America > The Bahamas (0.14)
North America > United States > Georgia > Cobb County > Marietta (0.04)
North America > United States > Florida > Escambia County > Pensacola (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Efficient Hate Speech Detection: Evaluating 38 Models from Traditional Methods to Transformers

Abusaqer, Mahmoud, Saquer, Jamil, Shatnawi, Hazim

arXiv.org Artificial IntelligenceSep-19-2025

The proliferation of hate speech on social media necessitates automated detection systems that balance accuracy with computational efficiency. This study evaluates 38 model configurations in detecting hate speech across datasets ranging from 6.5K to 451K samples. We analyze transformer architectures (e.g., BERT, RoBERTa, Distil-BERT), deep neural networks (e.g., CNN, LSTM, GRU, Hierarchical Attention Networks), and traditional machine learning methods (e.g., SVM, CatBoost, Random Forest). Our results show that transformers, particularly RoBERTa, consistently achieve superior performance with accuracy and F1-scores exceeding 90%. Among deep learning approaches, Hierarchical Attention Networks yield the best results, while traditional methods like CatBoost and SVM remain competitive, achieving F1-scores above 88% with significantly lower computational costs. Additionally, our analysis highlights the importance of dataset characteristics, with balanced, moderately sized unprocessed datasets outperforming larger, preprocessed datasets. These findings offer valuable insights for developing efficient and effective hate speech detection systems.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3696673.3723061

2509.14266

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
(23 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

A Survey: Towards Privacy and Security in Mobile Large Language Models

Xu, Honghui, Li, Kaiyang, Chen, Wei, Zheng, Danyang, Li, Zhiyuan, Cai, Zhipeng

arXiv.org Artificial IntelligenceSep-3-2025

--Mobile Large Language Models (LLMs) are revolutionizing diverse fields such as healthcare, finance, and education with their ability to perform advanced natural language processing tasks on-the-go. However, the deployment of these models in mobile and edge environments introduces significant challenges related to privacy and security due to their resource-intensive nature and the sensitivity of the data they process. This survey provides a comprehensive overview of privacy and security issues associated with mobile LLMs, systematically categorizing existing solutions such as differential privacy, federated learning, and prompt encryption. Furthermore, we analyze vulnerabilities unique to mobile LLMs, including adversarial attacks, membership inference, and side-channel attacks, offering an in-depth comparison of their effectiveness and limitations. T o bridge this gap, we propose potential applications, discuss open challenges, and suggest future research directions, paving the way for the development of trustworthy, privacy-compliant, and scalable mobile LLM systems. The advent of mobile Large Language Models (LLMs) represents a significant milestone in the evolution of AI, enabling advanced natural language processing capabilities to be deployed in mobile and edge environments [1]-[3]. By bringing powerful AI tools closer to end-users, mobile LLMs are revolutionizing industries such as healthcare [4], finance [5], and education [6] with real-time, on-device applications.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.02411

Country:

North America > United States > Massachusetts (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Georgia > Cobb County > Marietta (0.04)
(3 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Large Language Models for Subjective Language Understanding: A Survey

Song, Changhao, Zhang, Yazhou, Gao, Hui, Yao, Ben, Zhang, Peng

arXiv.org Artificial IntelligenceAug-12-2025

Subjective language understanding refers to a broad set of natural language processing tasks where the goal is to interpret or generate content that conveys personal feelings, opinions, or figurative meanings rather than objective facts. With the advent of large language models (LLMs) such as ChatGPT, LLaMA, and others, there has been a paradigm shift in how we approach these inherently nuanced tasks. In this survey, we provide a comprehensive review of recent advances in applying LLMs to subjective language tasks, including sentiment analysis, emotion recognition, sarcasm detection, humor understanding, stance detection, metaphor interpretation, intent detection, and aesthetics assessment. We begin by clarifying the definition of subjective language from linguistic and cognitive perspectives, and we outline the unique challenges posed by subjective language (e.g. ambiguity, figurativeness, context dependence). We then survey the evolution of LLM architectures and techniques that particularly benefit subjectivity tasks, highlighting why LLMs are well-suited to model subtle human-like judgments. For each of the eight tasks, we summarize task definitions, key datasets, state-of-the-art LLM-based methods, and remaining challenges. We provide comparative insights, discussing commonalities and differences among tasks and how multi-task LLM approaches might yield unified models of subjectivity. Finally, we identify open issues such as data limitations, model bias, and ethical considerations, and suggest future research directions. We hope this survey will serve as a valuable resource for researchers and practitioners interested in the intersection of affective computing, figurative language processing, and large-scale language models.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.07959

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Michigan (0.04)
(7 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)
Leisure & Entertainment (0.92)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exoplanet Detection Using Machine Learning Models Trained on Synthetic Light Curves

Lo, Ethan, Lo, Dan C.

arXiv.org Artificial IntelligenceJul-29-2025

With manual searching processes, the rate at which scientists and astronomers discover exoplanets is slow because of inefficiencies that require an extensive time of laborious inspections. In fact, as of now there have been about only 5,000 confirmed exoplanets since the late 1900s. Recently, machine learning (ML) has proven to be extremely valuable and efficient in various fields, capable of processing massive amounts of data in addition to increasing its accuracy by learning. Though ML models for discovering exoplanets owned by large corporations (e.g. NASA) exist already, they largely depend on complex algorithms and supercomputers. In an effort to reduce such complexities, in this paper, we report the results and potential benefits of various, well-known ML models in the discovery and validation of extrasolar planets. The ML models that are examined in this study include logistic regression, k-nearest neighbors, and random forest. The dataset on which the models train and predict is acquired from NASA's Kepler space telescope. The initial results show promising scores for each model. However, potential biases and dataset imbalances necessitate the use of data augmentation techniques to further ensure fairer predictions and improved generalization. This study concludes that, in the context of searching for exoplanets, data augmentation techniques significantly improve the recall and precision, while the accuracy varies for each model.

artificial intelligence, exoplanet, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2507.1952

Country:

North America > United States > Georgia > Cobb County > Marietta (0.04)
Europe > Spain (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Space Agency (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

CA-Cut: Crop-Aligned Cutout for Data Augmentation to Learn More Robust Under-Canopy Navigation

Mamo, Robel, Choi, Taeyeong

arXiv.org Artificial IntelligenceJul-25-2025

State-of-the-art visual under-canopy navigation methods are designed with deep learning-based perception models to distinguish traversable space from crop rows. While these models have demonstrated successful performance, they require large amounts of training data to ensure reliability in real-world field deployment. However, data collection is costly, demanding significant human resources for in-field sampling and annotation. To address this challenge, various data augmentation techniques are commonly employed during model training, such as color jittering, Gaussian blur, and horizontal flip, to diversify training data and enhance model robustness. In this paper, we hypothesize that utilizing only these augmentation techniques may lead to suboptimal performance, particularly in complex under-canopy environments with frequent occlusions, debris, and non-uniform spacing of crops. Instead, we propose a novel augmentation method, so-called Crop-Aligned Cutout (CA-Cut) which masks random regions out in input images that are spatially distributed around crop rows on the sides to encourage trained models to capture high-level contextual features even when fine-grained information is obstructed. Our extensive experiments with a public cornfield dataset demonstrate that masking-based augmentations are effective for simulating occlusions and significantly improving robustness in semantic keypoint predictions for visual navigation. In particular, we show that biasing the mask distribution toward crop rows in CA-Cut is critical for enhancing both prediction accuracy and generalizability across diverse environments achieving up to a 36.9% reduction in prediction error. In addition, we conduct ablation studies to determine the number of masks, the size of each mask, and the spatial distribution of masks to maximize overall performance.

artificial intelligence, ca-cut, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.17727

Country:

North America > United States > Indiana (0.04)
North America > United States > Illinois (0.04)
North America > United States > Georgia > Cobb County > Marietta (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback